Utilising Tree-Based Ensemble Learning for Speaker Segmentation
In audio and speech processing, accurate detection of the change points between multiple speakers in a speech segment is an important stage for several applications such as speaker identification and tracking. Bayesian Information Criterion (BIC)-based approaches are the most widely used, as they have proved very effective for this task. The main criticism levelled against BIC-based approaches is the use of a penalty parameter in the BIC function, which must be fine-tuned for each variation of the acoustic conditions. Once tuned for a certain condition, the model becomes biased towards the data used for tuning, limiting its ability to generalise. In this paper, we propose a BIC-based, tuning-free approach to speaker segmentation through ensemble learning. A forest of segmentation trees is constructed, in which each tree is trained using a sampled version of the speech segment. During tree construction, a set of randomly selected points in the input sequence is examined as potential segmentation points; the point that yields the highest ΔBIC is chosen, and the same process is repeated for the resulting left and right segments, so that each node of the tree corresponds to the highest ΔBIC and its associated point index. After building the forest, the accumulated ΔBIC across all trees is calculated for each point, and the positions of the local maxima are taken as speaker change points. The proposed approach is tested on artificially created conversations from the TIMIT database. It shows very accurate results, comparable to those achieved by state-of-the-art methods, with a 9% (absolute) higher F1 score than the standard ΔBIC approach with an optimally tuned penalty parameter.
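A minimal Python sketch of the forest-of-segmentation-trees idea as described in the abstract, not the authors' implementation: it assumes full-covariance Gaussian models over feature frames (e.g. MFCCs), drops the BIC penalty term (the ensemble vote replaces the tuned parameter), and the names grow_tree, min_len and n_candidates are illustrative assumptions. Here the per-tree randomness comes from candidate sampling alone, whereas the paper additionally resamples the segment per tree.

```python
import numpy as np

def delta_bic(X, i):
    """Gaussian dBIC at split point i: one model for the whole segment vs.
    two models for the sub-segments. Positive values favour a change at i.
    The penalty term is omitted; the ensemble vote stands in for it."""
    def logdet_cov(Z):
        cov = np.cov(Z, rowvar=False) + 1e-6 * np.eye(Z.shape[1])
        return np.linalg.slogdet(cov)[1]
    n = len(X)
    return (n * logdet_cov(X)
            - i * logdet_cov(X[:i])
            - (n - i) * logdet_cov(X[i:]))

def grow_tree(X, lo, hi, scores, rng, n_candidates=20, min_len=50):
    """Pick the random candidate point with the highest dBIC, accumulate its
    score at that index, then recurse on the left and right sub-segments."""
    if hi - lo <= 2 * min_len:
        return
    cands = rng.integers(lo + min_len, hi - min_len, size=n_candidates)
    vals = [delta_bic(X[lo:hi], c - lo) for c in cands]
    best = int(np.argmax(vals))
    if vals[best] <= 0:          # no evidence of a speaker change here
        return
    split = int(cands[best])
    scores[split] += vals[best]
    grow_tree(X, lo, split, scores, rng, n_candidates, min_len)
    grow_tree(X, split, hi, scores, rng, n_candidates, min_len)

def segment(X, n_trees=100):
    """Accumulate dBIC over the forest; local maxima of the accumulated
    scores are returned as hypothesised speaker change points."""
    scores = np.zeros(len(X))
    rng = np.random.default_rng(0)
    for _ in range(n_trees):
        grow_tree(X, 0, len(X), scores, rng)
    return [i for i in range(1, len(X) - 1)
            if scores[i] > 0
            and scores[i] >= scores[i - 1]
            and scores[i] >= scores[i + 1]]
```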
A Contextual Study of Semantic Speech Editing in Radio Production
Radio production involves editing speech-based audio using tools that represent sound using simple waveforms. Semantic speech editing systems allow users to edit audio using an automatically generated transcript, which has the potential to improve the production workflow. To investigate this, we developed a semantic audio editor based on a pilot study. Through a contextual qualitative study of five professional radio producers at the BBC, we examined the existing radio production process and evaluated our semantic editor by using it to create programmes that were later broadcast.

We observed that the participants in our study wrote detailed notes about their recordings and used annotation to mark which parts they wanted to use. They collaborated closely with the presenter of their programme to structure the contents and write narrative elements. Participants reported that they often work away from the office to avoid distractions, and print transcripts so they can work away from screens. They also emphasised that listening is an important part of production, to ensure high sound quality. We found that semantic speech editing with automated speech recognition can be used to improve the radio production workflow, but that annotation, collaboration, portability and listening were not well supported by current semantic speech editing systems. In this paper, we make recommendations on how future semantic speech editing systems can better support the requirements of radio production.
Is voice a marker for autism spectrum disorder? A systematic review and meta-analysis
Individuals with Autism Spectrum Disorder (ASD) tend to show distinctive, atypical acoustic patterns of speech. These behaviours affect social interactions and social development and could represent a non-invasive marker for ASD. We systematically reviewed the literature quantifying acoustic patterns in ASD. Search terms were: (prosody OR intonation OR inflection OR intensity OR pitch OR fundamental frequency OR speech rate OR voice quality OR acoustic) AND (autis* OR Asperger). Results were filtered to include only empirical studies quantifying acoustic features of vocal production in ASD, with a sample size > 2, and the inclusion of a neurotypical comparison group and/or correlations between acoustic measures and severity of clinical features. We identified 34 articles, including 30 univariate studies and 15 multivariate machine-learning studies. We performed meta-analyses of the univariate studies, identifying significant differences in mean pitch and pitch range between individuals with ASD and comparison participants (Cohen's d of 0.4-0.5 and discriminatory accuracy of about 61-64%). The multivariate studies reported higher accuracies than the univariate studies (63-96%). However, the methods used and the acoustic features investigated were too diverse to permit meta-analysis. We conclude that multivariate studies of acoustic patterns are a promising but as yet unsystematic avenue for establishing ASD markers. We outline three recommendations for future studies: open data, open methods, and theory-driven research.
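The effect sizes reported above are standardised mean differences (Cohen's d, pooled-standard-deviation form). For reference, a minimal sketch of that textbook computation, not code from the review; the group arrays (e.g. per-participant mean pitch values) are hypothetical inputs:

```python
import numpy as np

def cohens_d(group_a, group_b):
    """Standardised mean difference between two independent groups,
    using the pooled standard deviation."""
    a, b = np.asarray(group_a, float), np.asarray(group_b, float)
    na, nb = len(a), len(b)
    pooled_sd = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                        / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled_sd

# e.g. cohens_d(mean_f0_asd, mean_f0_comparison) -> ~0.4-0.5 would match
# the pitch effect reported in the meta-analysis (variable names hypothetical).
```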
Audio source separation into the wild
This review chapter is dedicated to multichannel audio source separation in real-life environments. We explore some of the major achievements in the field and discuss some of the remaining challenges. We examine several important practical scenarios, e.g. moving sources and/or microphones, a varying number of sources and sensors, high reverberation levels, spatially diffuse sources, and synchronization problems. Several applications, such as smart assistants, cellular phones, hearing aids and robots, are discussed. Our perspectives on the future of the field are given as concluding remarks to this chapter.
The ICSI RT-09 Speaker Diarization System
Tools for multimodal annotation
Researchers interested in the sounds of speech or the physical gestures of speakers make use of audio and video recordings in their work. Annotating these recordings presents a different set of requirements from the annotation of text. Special-purpose tools have been developed to display video and audio signals and to allow the creation of time-aligned annotations. This chapter reviews the most widely used of these tools for both manual and automatic generation of annotations on multimodal data.